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Abstract 

This paper develops a Bayesian network-based method for the calibration of multi-physics 
models, integrating various sources of uncertainty with information from computational 
models and experimental data. We adopt the well-known Kennedy and O'Hagan (KOH) 
framework for model calibration under uncertainty, and develop extensions to multi-physics 
models and various scenarios of available data. Both aleatoric uncertainty (due to natural 
variability) and epistemic uncertainty (due to lack of information, including data uncertainty 
and model uncertainty) are accounted for in the calibration process. Challenging aspects 
of Bayesian calibration for multi-physics models are investigated, including: (1) calibration 
with different forms of experimental data (e.g., interval data and time series data), (2) 
determination of the identifiability of model parameters when the analytical expression of 
model is known or unknown, (3) calibration of multiple physics models sharing common 
parameters, and (4) efficient use of available data in a multi-model calibration problem 
especially when the experimental resources are limited. A first-order Taylor series expansion- 
based method is proposed to determine which model parameters are identifiable, i.e., to 
find the parameters that can be calibrated with the available data. Following the KOH 
framework, a probabilistic discrepancy function is estimated and added to the prediction 
of the calibrated model, attempting to account for model uncertainty. This discrepancy 
function is modeled as a Gaussian process when sufficient data are available for multiple 
model input combinations, and is modeled as a random variable when the available data 



set is small and limited. The overall approach is illustrated using two application examples 
related to microelectromechanical system (MEMS) devices: (1) calibration of a dielectric 
charging model with time-series data, and (2) calibration of two physics models (pull-in 

voltage and creep) using measurements of different physical quantities in different devices. 
Keywords: Model calibration, interval data, time series data, identifiability, Bayesian 
network, multi-physics 



1. Introduction 

Stochastic multi-physics simulation is a key component in the reliability analysis of en- 
gineering components/devices, which requires solving several computational models while 
accounting for various sources of uncertainty. Calibration of these multi-physics computa- 
tional models can be challenging due to the complex structure of the system, existence of 
multiple uncertainty sources, and limited experimental data. 

Model calibration can be viewed as the process of adjusting the value or the prior dis- 
tribution of unknown model parameters in order to improve the agreement between the 
model output and observed data [IH3]. In comparison to other calibration methods (such 
as maximum likelihood and least squares) which return point /interval estimates, Bayesian 
inference returns the posterior probability density functions (PDF) of unknown parameters. 
These posterior PDFs account for various sources of uncertainty existing in the computer 
model and experimental observation, including natural variability in model inputs, data un- 
certainty (measurement uncertainty and epistemic uncertainty due to insufficient data), and 
model uncertainty [I]. 

Kennedy and O'Hagan [TJ developed a Bayesian framework (commonly known as the 
Kennedy and O'Hagan (KOH) framework) for the calibration of computer models under 
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various sources of uncertainty, and a discrepancy function was introduced in order to account 
for the discrepancy between observed data and the calibrated model prediction. In addition 
to the KOH framework, significant research efforts have been devoted to the development of 
Bayesian methods for scientific and engineering applications [oTflO]. However, several issues 
remain unclear in the implementation of Bayesian calibration for practical problems with 
complicated systems of models: (1) calibration with different types of available data, such as 
point data, interval data, and time series data, (2) identifiability of model parameters, i.e., 
how to find out which parameters can or cannot be calibrated using the Bayesian approach 
for a given computer model, (3) calibration of multiple models sharing common parameters, 
and (4) efficient use of experimental data in calibration, which may be useful for the cases 
that only a limited amount of data are available. 

Aimed at providing potentially useful directions for solving the above issues, this paper 
develops a Bayesian network-based calibration approach for multi-physics computational 
models. The Bayesian network is a powerful tool to represent complicated systems with a 
set of nodes and the probabilistic relation between the nodes [TTHT3] . and the observation 
data of some of the nodes can be conveniently incorporated into the network to facilitate the 
inference on other nodes. Based on the information contained in the Bayesian network and 
the observation data, model parameters can be calibrated accounting for different sources 
of uncertainty, and the posterior PDFs of the parameters can be obtained. Note that 
this paper focuses on model calibration with direct measurement data of the model output 
variable. In the case that the available information are the moments of the probability 
distributions of the model output variable, some recently developed methodologies based 
on optimization with constraints on the moments may be considered [HI [15] . In addition, 
a Bayesian approach has been developed to include the information on the moments of 
unknown model parameters |16j . 

In this paper, we first present the basic framework of model calibration using the Bayesian 
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network in Section [2j In the subsequent sections, practical issues in the application of 



Bayesian calibration are discussed. In Section |3.1[ two types of experimental data are 
considered, namely interval data and time series data, and the corresponding details of 
calibration are developed. For models with multiple parameters, it is possible that not all 
of the parameters can be calibrated due to the inner structure of the model or the amount 
of the available experimental data. Knowing which parameters are unidentifiable can save 
computational effort. A first-order Taylor series expansion-based method is developed for 



this purpose in Section 3.2 Some discussions on computing likelihood functions, which 



can be computationally expensive for complex systems, are provided in Section |3.3| A 
Bayesian network-based method is developed in Section [4] for multi-physics computational 
models, which efficiently uses the available experimental data in model calibration. Section [5] 
illustrates the aforementioned methods for the calibration of (1) a dielectric charging model 
with time series data, and (2) a multi-physics modeling system for MEMS devices, which 
includes multiple interacting computational models (dynamic, electrostatic, damping, and 
creep models). 

2. Basics of model calibration using a Bayesian network 

Consider a computer model with inputs x, parameters 6, and output y m (as shown in 



Fig. 1(a)) 



y m = G(x;0) (1) 

This model is constructed to predict a physical quantity y, which is observable through ex- 
periments, i.e., the model output y m is a prediction of the actual quantity y. The distinction 
between the model inputs x and parameters 6 may not always be obvious. In this paper, we 
consider the model inputs as observable quantities, and are represented with specified deter- 
ministic values or probability distributions (if stochastic). In contrast, the model parameters 

are not observable, and the objective of model calibration is to estimate these parameters 
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based on available information. Although both the terms "experimental inputs" and "model 
inputs" refer to the same physical quantities, it should be noted that the term "experimental 
inputs" (denoted as Xd) is used to represent the measurement of these physical quantities in 
calibration experiments, whereas the term "model inputs" refers to the set of values of these 
quantities that go into the model. In the process of model calibration, the model inputs x 
are set to be Xd, since the model will only be evaluated at Xp. 

Suppose the measurement uncertainty is represented using a zero-mean Guassian random 
variable e b s with variance o 2 ohs . Following the KOH framework, the model uncertainty 
is represented by a model discrepancy function, which is approximated using a Gaussian 
process, i.e., S ~ J\f(m{x\ (f>), k(x, x'; 93)); where m(*) is the mean function of this Gaussian 
process S and (p is the set of coefficients of m(*); k(*) is the covariance function of S and 
ip is the set of coefficients of k(*). The choice of the form of the mean function can be 
rather subjective and is typically problem-specific. One such choice will be demonstrated 



in the numerical example in Section 5.1 The choice of covariance function may also be 



problem-specific, but a squared exponential function is commonly used for illustration, i.e., 

k(x,x';cp) = Xe^(-j2 iXi ~ p X,i}2 ) (2) 

i=i % 

where <p = [A, Zi, Z2, ■■■,lq], and q is the dimension of the inputs x\ A is the variance of this 
Gaussian process; U is the length-scale parameter corresponding to the input x,. Higher 
values of k indicate higher statistical correlation between Xi and x\. Note that if the KOH 
framework is strictly followed, the computer model Gix\ 6) will also be approximated using 
a Gaussian process in order to reduce the computational effort. 

Since a a b s , 4> and (f are usually unknown, they may also need to be calibrated. If some 
observation data (denoted as D) of y are available, we can calibrate the unknown parameters 
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0, a obs , (f) an d f using Bayes' theorem as 

C(0, (T obs , <fi, ifi)'K(0)-K(a obs )-K((f))'K(ifi) 



Tr(6,a obs: (f),cp\D) 



f C(0, a obs , <fi, (p)n(O)n((r obs )7i((l))7i(ip)d0da obs d<pd(p 
C(0,a obs ,(j>,(p) oc ir(D\0,a obs ,(j),(p) (3) 

where n(0), Ti(cr obs ), n(<p) and ir((p) are the prior PDFs of 0, a obs , <fi and cp respectively, 
representing prior knowledge of these parameters before calibration; n(0, a obs , <fi, <p\D) is the 
joint posterior (or calibrated) PDF of 0, cr obs , 4> and ip; the joint likelihood function of 0, a obs , 
4> and (p, which is denoted as C(0, a obs , (f>, ip), is proportional to the conditional probability 
of observing the data D given these parameters. Note that 7r(*) denotes probability density 
function in this paper. 

The likelihood function can be computed based on the construction of a Bayesian net- 
work. A Bayesian network is a directed acyclic graph formed by the variables (nodes) 
together with the directed edges, attached by a table of conditional probabilities of each 
variable on all its parents Therefore, it is a graphical representation of uncertain quan- 
tities that explicitly incorporates the probabilistic causal dependence between the variables 



as well as the flow of information in the model [T7]. Fig. 1(b) shows the graph of the 



Bayesian network for the example model and experimental observation described in Eq. [4] 



and Fig. 1(a) Herein we consider the experimental observation yjj corresponding to a given 
experimental input setting random variable, and the actual observation data point 

D is a random realization of yo- The relationship between yr> and the model output y m can 
be written as 

Ud = y m + 8 + e obs (4) 
Based on the constructed Bayesian network and the chain rule in probability theory, the 
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Figure 1: (a) the illustrative diagram of a computer model, and (b) the corresponding Bayesian network 
joint likelihood function of 0, cr obs , 4> an d <£> can be derived as 



£(0,a obs ,ct>, if) oc (5) 
n(y D = D\y m , e obs , 5)n(y m \x D , 0)n(e obs \a obs )ir(5\x D , <f>, ip)n(x D )dy m de obs d5dx D 



To compute C(0, a obs , (f>, cp) in Eq. [5j the conditional probabilities corresponding to the 
directed edges are needed. The PDF of the experimental inputs tc(x d ) is assumed known as 
stated at the beginning of this subsection. Recall that the measurement error e obs is assumed 
to be normally distributed, and thus Tt(s obs \a obs ) is a normal probability density function 
with zero mean and variance o 2 ohs which is evaluated at e obs . ti(8\xe>, <fi, (p) is also a normal 
probability density function, since the model discrepancy function S is approximated as a 
Gaussian process with index x^. Note that in this example, for given values of x^, and 0, the 
model output y m is deterministic; yn is also deterministic if the values of y m , e obs , and S are 
given. In general, inference in Bayesian networks with conditionally deterministic variables 
is not straightforward since conditional probability density functions such as 7i(y m \x D , 0) 
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and 7r(y£> = D\y m , 8, £ & s ) do not exist, and some related discussions and solutions can be 
found in [181 US]- I n this example, however, we can bypass the conditionally deterministic 
variables because of the assumptions that e obs and 5 are normally distributed given the 
corresponding parameters. That is to say, we can derive without computing the integration 
in Eq. [5] that (y D \x D ,0,a obs ,4>,ip) ~ M(G(x D ; 0) + m(x D ;4>),k(x D ,x D ;ip) + a 2 bs ), where 
m(xD](j)) and k(x d , a? £>; <£>) are the mean and covariance function of 5 respectively. Thus, 
C(6, <j b s , 0, <p) can be obtained as 



C(0,a obs ,<p,(p) oc / n(y D = D\x D ,6,a obs ,<p,(p) ir(x D ) dx D 



(6) 



The case discussed above considers calibration with a single data point. For the case that 
the experimental observations are taken for multiple input setting X d = [xdi, XDn], 
let y D = [ym, Vd2, VDn] be the vector of experimental observations corresponding to X D . 
Similarly, we can derive that 



(y D \X D} 0, a obs , 0, tp) ~ Af(iJ,y D ,E + a 2 obs I) 



(7) 



where 



G(x m ;0) +m(x m ;(f>) 



G(x Dn ; 0) + m(x Dn ; 0) 



k(x m ,x m ;<p) ... k(x m ,x Dn ;(p) 



k(x Dn ,x D1 ;<p) ■■■ k(x Dn ,x Dn ;(p) 



(8) 



Note that in Eq. |HJ for given values of X d, 0, cr obs , <fi and <£>, y D is a Gaussian random 
vector that represents the experimental observation corresponding to multiple input settings. 
Further, let D = [Di, D 2 , D n ] represent the actual observation data which are random 



S 



realizations of the random vector y D . In this case, C(6, a obs , <fi, <p) can be obtained as 



The calculation of the likelihood function is the key component in implementing Bayesian 
model calibration. In the cases discussed in this section, the likelihood function is con- 
structed based on measurement data reported as point values. In practical problems, var- 
ious types of experimental data may be available, including interval data and time series 
data. For these different types of data, the corresponding methods to calculate the likelihood 
function are presented in Section [3] 

3. Practical issues in implementing Bayesian calibration 

3.1. Different types of experimental data 
3.1.1. Bayesian calibration with interval data 

Due to the imprecision of measurement techniques and limited experimental resources, 
the measurement of many quantities is only available in the form of an interval, which brings 
in additional data uncertainty (i.e., the actual experimental value lies within an interval). 
Sankararaman and Mahadevan [20] developed a likelihood-based approach to quantify this 
type of uncertainty. In the example shown in Fig. [TJ the experimental data D may be 
reported as an interval, i.e., D = [D a ,D b ], and we can derive the corresponding expression 
for the likelihood function of unknown parameters based on the method developed in [20] 
as 



£(0, o- obs , 0, ip) oc / n(y D = D\X D ,6,a obs ,(f),(p) ir(X D ) dX 



D 



(9) 



C(6, a obs , 4>, ip) cx Vi{D a <y D < D b \6, a obs , cj>, cp) 

= / 7i(y D \x D ,0,a obs ,(f),(p) tt(x d ) dy D dx D 




(10) 
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where Tr(y D \x D ,6,a obs ,(f),p) is a normal PDF as discussed in Section [2] Note that the 
likelihood-based approach [20] is not limited to the case of normal distributions. If data are 
in the form of half intervals, i.e., yo > D a , the likelihood function can be obtained by letting 



D b = +00 in Eq. [To] Similarly, let D a = -00 if y D < D b . 

In some problems, measurements may be available at multiple input settings Xo = 
[xdi,Xe>2, ...XE> n ], and the data may be in the form of multiple intervals or a mixture of 
intervals and point values. We can conveniently extend Eq. [I0]to these two cases. Suppose 
the available data are now a set of intervals, i.e., D = {[Df, D\], [D%, D%], [£)*, £)„]}, 
which forms a n-dimensional hypercube fl n . The probability of observing the data is thus 
equivalent to the probability of the n-dimensional random vector y D = [yni, Vd2, VDn] 
lying inside the hypercube Q n . Hence, the likelihood function of unknown parameters can 
be derived as 



£(0,(T obs ,(t),<p) oc Pr(y D e fi n |0, cr o6s , 0, ip) 

ir(y D \X D , 0, a obs , 0, ip) n(X D ) dy D dX D (11) 

In the case that the available information is a mixture of k intervals and (n — k) point 
values, the k intervals form a fc-dimensional hypercube Let y^^ represent the elements 
of the random vector y D corresponding to interval data, and y D n _ k represent the rest of 
the elements corresponding to point data D point = [D n _ fc+1 , D n _ k+2 , ...,D n }. The likelihood 
function C(0, <T b s , <fi, <p) can be derived as 

C(0,a obs ,cj>, cp) oc Pr ((y D>k G fi (y D ^ k = D point )\0,a obs ,(f>,cp) 

n{y D ,k\yD,n-k = D point} X D , 9, a obs , 0, (p) 7v(X D ) dy D k dX D (12) 

where n(y D)k \yD,n-k = D poinU X D , 6, cr obs , 0, ip) is obtained by substituting y DyU _ k = D point 
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into the joint PDF of the random vector y D . 

3.1.2. Time series data 

In dynamic systems, information is commonly available in the form of time series data. 
This type of information leads to several additional challenges; in particular, the model 
prediction and the corresponding measurement data are dependent on the states of the 
system in the previous time steps, and replicate time series observations may be taken with 
a large number of time points. Both of these characteristics may complicate the computation 
of the likelihood function. To perform model calibration with time-series data, we again use 
the KOH framework discussed in Section [2j In general, a dynamic model can be written 
as y m} t = G(y m _ t ,x,t;0); y nitt represents the model prediction at time t; y m _ t represents 
the model predictions for the previous time steps; x and are the same as in Section |2j 
Note that y m<t is deterministic for given values of y m _ t , x, t and 6. The Gaussian process 
used to approximate the model discrepancy function also becomes time dependent, and 
thus 5 t ~ Af(m(x, t; 0), k(x, x', t, t'; <£>)). The measurement uncertainty is still represented 
as e a bs ~ A/"(0, cr^ bs ), which is time-independent. Similar to Eq. [IJ the relationship between 
experimental observation yu tt and the corresponding model prediction y m j can be written 
as 

yD,t = Vm,t + S t + S obs (13) 

The fact that y m j is dependent on y m _ t renders the construction of the likelihood func- 
tion C(6, a b s , 0, ip) difficult. For example, suppose we have one set of time-series data 
available D t = [D t i, D t2 , D tn ], i.e., the measurements are taken at several time points 
tD = ti, t2, t n when the values of the experimental inputs xp remain the same. Note that 
again we consider that the actual data D t are random realizations of the random vector 
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Vdi = [yD,tiiVD,t2, ■■■,yD,tn)- The corresponding likelihood function can be written as 



C{0, a obs , 0, tp) oc 7i{y D j = D t \0, a obs , (p, ip) 

J n(y D>t = D t \y m _ t , x D , 0, a obs , 0, tp) ir(y m _ t \x D , 6) n(x D ) dy m _ t dx D (14) 



Note that if strictly written, y m _ t in Eq. 14 should be different for different data points. We 
chose not to write out each "?/ m _ t " to avoid making the equation unnecessarily complex. 
It can be seen that the PDF of model predictions for the past time points 7t(y m _ t \xn, 0) 
and an integration over all the elements of y m _ t are needed, which makes the evaluation of 
the likelihood function analytically intractable, and numerically expensive methods (such as 
Monte Carlo simulation) may be needed. 

The extension to the cases where multiple sets of time series data are available is straight- 
forward in theory. The difference from the case of single time series data is that y D t , y m _ t 



and Xjj in Eq. 14 become matrices instead of being vectors. In the special case that the 
multiple series are replicates, i.e., we have repeated measurements at the same time points 
for the same set of inputs, the variation from one series to another can be attributed to 
the observation noise e obs , and thus we can directly compute e obs based on these repeated 
time series. Assuming that e obs is a zero-mean Gaussian random variable with variance a 2 ohs 
independent of time t, the variance a 2 ohs can be estimated as 



-| '"1 ><>£ -| <"A 

v ' 7=1 i=l i=l 



where is the number of repeated time series, and rt\ is the number of time points in each 
series. The estimated a obs and the average time series (l/n 2 ) Y^iliivbtj) can ^ e further used 
to compute the likelihood as in the single time series case. 

If the measurement uncertainty in the experimental inputs is negligible, xd can be treated 
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as constant. Note that y m _ t is deterministic for given values of Xp and 6 (initial condition 
is also considered as input). Then, the calibration with time series data D t becomes much 
simpler, as Eq. 14 can be simplified as 



C(0, cr obs , <fi, ip) oc n(y D t = D t \x D , 0, a obs , 0, ip) (16) 

where the calculation of 7i(yo,t = D t \xo, 0, <J b s , 0, <p) is very similar to the cases discussed in 
Section |2j and the related information can be found in Eq. [8] and the surrounding discussion. 

In the case that the time series data set is becoming available in real time during the 
operation of a system, i.e., the observation is made in real time and the model is used to make 
predictions for future time points, we can continuously calibrate (or update) the model using 
algorithms such as Kalman filter (for linear models), extended Kalman filter (for non-linear 
models), particle filter (sampling implementation of sequential Bayesian calibration) [2 1 j , 
etc. 

3.2. Identifiability of model parameters 

Before implementing the calibration of a model, it is often of interest to determine 
whether we can extract useful information from the calibration results. A model is non- 
identifiable if there are infinite "best" (depending on the criterion chosen) estimates for the 
model parameters. In the Bayesian model calibration framework, the typical sign of non- 
identifiabilty is that the posterior PDFs of some of the model parameters are close to the 
prior PDFs, which indicates that the marginal likelihoods of these parameters are nearly 
flat and there is an infinite number of maximum likelihood estimates of the model parame- 
ters. In general, model non-identifiability can be classified into two levels, namely structural 
non-identifiability and practical non-identifiability [22] . The first level of non-identifiability, 
structural non-identifiability is due to the redundant parameterization of the model struc- 
ture. Even if the model is structurally identifiable, a second level of non-identifiability, practi- 
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cal non-identifiability, can still arise due to the insufficient amount and quality of observation 
data. The quality of data is related to the bias and noise in the data due to the imprecision 
of measurement techniques. Successful detection of structural non-identifiability may help 
reduce model redundancy. Also, by detecting the existence of practical non-identifiability, 
analysts may be able to overcome this issue by developing better design of experiments or 
improving data quality [23J. 

It is usually straightforward to detect structural non-identifiability if the analytical ex- 
pression of a model is available; however, in many problems, the analytical expression of the 
model is not readily available. One example is a dynamic model without an explicit steady 
state solution. Another possible case occurs when we add a discrepancy function to the nu- 
merical solutions of some governing equations, which in fact forms a new model without any 
analytical expression to consider [23]. Various analytical and numerical methods have been 
developed to detect the structural non-identifiability of dynamic models j25H27], whereas 
the second possible case does not appear to have been studied in the literature. This section 
addresses this case. 

Since the second level of non-identifiability, practical non-identifiability, is related to 
both model structure and observation data, it is necessary to inspect the likelihood function 
in order to determine whether some parameters of a model are practically non-identifiable. 
In fact, rigorous definitions of model non-identifiability can be constructed based on the 
analytical properties of likelihood functions [28H30] . In addition to the theoretical analysis 
of likelihood functions, Raue et al. [22], [31] developed a numerical approach based on the 
concept of "profile likelihood" [32], which has been shown to be effective in detecting practical 
non-identifiability. When the analytical expression of the likelihood function is available, or 
its numerical evaluation is trivial, it may be preferable to apply the profile likelihood-based 
method and determine the practical non-identifiability directly. But this method becomes 
cumbersome when the construction of the likelihood function is computationally expensive, 
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since repetitive evaluations of the likelihood function are required to compute the profile 
likelihood. 

Given the above observations, we propose a first-order Taylor series expansion-based 
method, which can detect structural non-identifiability for models without analytical ex- 
pressions, and can detect practical non-identifiability due to insufficient amount of data. 
This method does not involve computation of the likelihood function, and thus is simpler to 
implement and less computationally demanding. The limitations of this method are: (1) it 
uses a linear approximation of the model, and hence may fail to detect non-identifiability if 
the model is highly nonlinear; (2) it can only detect local non-identifiability as the Taylor 
series expansion is constructed based on the derivatives at a single point; (3) it does not 
apply to statistical models; and (4) it does not cover practical non-identifiability due to the 
quality of data. 

Suppose the physics model to be calibrated is y m = G(x;0), and a Gaussian process 
discrepancy function 5 ~ Af(m(x; <fi), k(x, x'; ^)) is added to the model. Thus, a new model 
is formed as G new (x; if)) = G(x; 0) + m(x; <fi), where if) = [0, 0] includes the parameters of 
the physics model (0) and the parameters of the mean of the discrepancy function (0). In 
the case that the measurement noise is a zero mean random variable, G new is the expectation 
of observation y^ according to Eq. |4| Further assume that the analytical form of G new is 
not available. In such cases, the first-order Taylor series expansion of this model (as shown 



in Eq. 17) can be used as an efficient approximation when the model is believed to be not 



highly nonlinear: 

E[y D ] = G new (x D ; ?/>) ~ G new (x D ; ip) + ^ 



dipi 



> (17) 



where if) can be the mean value of the prior of if), and p is the number of model parameters. 
Suppose there are n data points available, i.e., experimentally observed values D = 
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[Di , D 2 , • • • , D n ] corresponding to different input settings X D = [xm , Xd 2 , ■ ■ ■ , x D n ] ■ Without 
considering measurement uncertainty and the variance of the model discrepancy function, 
we can construct a linear system as 



A^ 1 



D, A 



dG n 



dGr, 



dipi \x D1 ,ip 



i)G 

new 

dip! \x Drl ^ 



dip p \x D1 ,ip 



dG 

new 



The linear system in Eq. [18] can be underdetermined or determined, depending on the 
rank of the matrix A (denoted as r a). If < p, the system is underdetermined and there 



will be an infinite number of i\) values satisfying Eq. 18 if = p, the system is determined 



and there will be a unique vector i/? satisfying Eq. [18} The latter case suggests that the 
model is practically identifiable given the available data points (assuming the quality of the 
data does not cause non-identifiability). The former case suggests that the model is non- 
identifiable either due to the model structure or insufficient data. If the inequality ta < p 
continue to hold as we increase the number of observation data, then it can be inferred that 
the model is structurally non-identifiable. 

In order to help researchers reduce model redundancy once a model is detected as struc- 
turally non-identifiable, it may be of interest to know which set of parameters can/cannot 



be identified. Using the formulation of the linear system in Eq. [18j we can retrieve this 
information by checking the linear dependency between the column vectors of the matrix 
A, since the i-th column of A corresponds to the parameter ipi. For example, if the z-th col- 
umn vector di and the j-th column vector a,j are linearly dependent, it is apparent that the 



corresponding parameters ipi and ipj are non-identifiable using the linear model in Eq. 17 



We can also find one set of identifiable parameters using the simple algorithm below (note 
that there may be multiple sets of identifiable parameters) 
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Algorithm 1 Find one set of identifiable parameters 
Input: The first-order derivative matrix A 
Output: The index set of identifiable parameters J 

Atemp A 

I = empty set 
for % = 1 to p do 

r\ = the rank of A temp 

Remove the i-th column from A temp 

T2 = the rank of A temv (with the 2-th column removed) 

if r\ > T2 then 

Add the value of % to the set I as an element 

end if 
end for 
return J 



To illustrate the proposed method, consider the following two test models: 
Model 1: y m = (fa + fa)xi + Z^i^l + V'i 
Model 2: y m = (fa + ^2)^1 + 2^33:2 

It is not difficult to see from the model expressions that the three parameters of model 
1 are identifiable given no less than three observations of y (the physical quantity to be 
predicted using model 1 and model 2) corresponding to different combinations of the inputs 
x\ and X2, whereas the three parameters of model 2 are not identifiable no matter how many 
data points are available. But in order to illustrate the proposed method, we assume that 
the analytical expressions of these two models are unavailable. 

Suppose the measurement data of y are taken at three points: [xi, x 2 ] l D = [5, 2], [x±, X2W, = 
[6,3], [ X11X2W) — [7,1], we can calculate the derivatives dy m /dijji numerically (e.g., forward 
difference or central difference) at these input points for given values of the parameters, and 
thus obtain the matrix A. For example, for [ipi, fa, fa] = [2,3,4], the matrix A for model 
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1 and model 2 are as follows 
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The rank of A\ is equal to 3, whereas the rank of A2 is equal to 2. Thus we can infer 
that the parameters of model 1 are identifiable, but the parameters of model 2 are not 
identifiable. We can also use the program in Algorithm [T] to infer that ip2 and ^3 of model 
2 are identifiable if the value of ipi is given. 

3.3. Computational issues 

Bayesian calibration in Eq. [3] can be computationally expensive due to two reasons: (1) 
the likelihood function may be expensive to compute numerically, and (2) the multivariate 
integration in the denominator of Eq. [3] can be time consuming if the number of parameters 
is large. 

The use of a Gaussian process to quantify the model discrepancy term as shown in 
Section [2] can lead to a high dimensional-parameter space, as a set of parameters <p and ip 
which characterize the Gaussian process also needs to be estimated in addition to the actual 
physics model parameters 6. For the case that the data points are sparse, it may not be 
feasible to calibrate the model along with the estimation of these parameters of the Gaussian 
process. A compromised solution is to use a simplified model discrepancy function with less 
flexibility, i.e., a smaller number of parameters. Another possible method is to estimate the 
model parameters and the parameters of the Gaussian process 4> in two sequential steps. 
First, the model parameters 6 are calibrated without considering model discrepancy. Then, 
we can estimate and (p based on the a posteriori estimate of (denoted as 0*), i.e., we 
can obtain the posterior PDF of 4> and ip, tt(4>, <p\0*), which is conditioned on 6*. 
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The likelihood function represents the probabilistic relationship between measured data 
and unknown parameters, and repeated runs of the computer model G(x; 6) are required to 
compute this relationship. Hence, previous studies have mostly focused on approximating 
the computational model with a surrogate model JTJ [7J, i.e., replacing the physics-based 
model G(x; 6) with a faster model without losing much accuracy. Surrogate modeling tech- 
niques that have been developed in literature include Kriging or Gaussian Process (GP) 
interpolation [33], polynomial chaos expansion [31H36] . support vector machine (SVM) [37] . 
relevance vector machine [38], adaptive sparse grid collocation [39J, etc. Then, the likeli- 
hood function of the parameters can be evaluated based on executing the surrogate model 
a number of times. 

If the measurement uncertainty is the only source of uncertainty considered and can be 
represented using a Gaussian random variable, the likelihood function can be calculated 
analytically based on the model predictions. However, in the case that various sources of 
uncertainty exist (e.g., natural variability in the input x, data uncertainty in input and 
output measurement, and model uncertainty), the likelihood function is no longer simple to 
compute (e.g., Eq.[5]). In that case, sampling methods like Monte Carlo simulation are needed 
to compute the function for given parameter values. If the number of calibration parameters 
is relatively large, the evaluation of the likelihood function can become expensive even with 
a fast surrogate model for G(x;0). In such cases, another surrogate model can be built 
to directly approximate the joint likelihood function of all the parameters, based on actual 
evaluations of the likelihood function for selected values of the parameters. Thereafter, we 
can evaluate this surrogate model, instead of the actual likelihood function, in the calculation 
of the posterior PDFs, which can speed up Bayesian calibration under multiple sources of 
uncertainty. For example, Bliznyuk et al. [101 approximated the unnormalized posterior 
density (the product of likelihood function and prior density) using radial basis functions. 

If the number of parameters is relatively small, the integration of the product of the 
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likelihood function and the prior PDFs of parameters can be conducted accurately and effi- 
ciently using numerical integration methods, such as Gaussian quadrature or the trapezoidal 
rule. When the number of parameters becomes large, Markov Chain Monte Carlo (MCMC) 
methods are widely used due to the relative insensitivity of the computational effort to 
the number of parameters. MCMC methods do not conduct the integration explicitly, but 
instead directly generate the random samples from the unnormalized posterior density of 
the parameters, upon convergence. Several algorithms are available for MCMC sampling, 
including Metropolis-Hastings [4THI5], Gibbs [IB], slice sampling [[47], etc. 

4. Calibration of multi-physics computational models 

Multi-physics modeling usually involves the combination of several models from differ- 
ent individual physics analyses. Ideally, these models would be calibrated separately with 
input-output experimental data corresponding to individual models. But in practice the 
experimental data may not be sufficient or available for some of the models. To calibrate all 
the models with limited information, a Bayesian network-based method is proposed below. 
The techniques discussed in Section [3] will be employed. 

4-1. Integration of multi-physics models and experimental data via Bayesian network 

Suppose we have two physics models y m \ = Gi(xi, &i, 612) and y m2 = G 2 (x2] 62, #12)- 
Note that these two models have different input variables (x\ versus x 2 ) and parameters 
(0i versus 2 ), but they also share some common parameters #12. Based on the discussion 
in Section |2j two Bayesian networks can be constructed for these two models individually. 
Further, due to the existence of the common parameters, these two networks can be con- 
nected to form a full network as shown in Fig. [2j which enables information flow from one 
network to the other. 
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Figure 2: Baycsian network for two physics models 

4-2. Strategy of Bayesian calibration for multi-physics models 

If both the observation data D x and D 2 are available, we have three options for model 
calibration based on the Bayesian networks in Fig. |2j as presented below. 

The first option is to calibrate the two models simultaneously. Let $1 and $ 2 repre- 
sents the calibration parameters of the two networks respectively except for the common 
parameters U , i.e., $1 = [O 1 ,cr obsl ,(f) l ,(p 1 \, $ 2 = [02,o- obs2 ,<p 2 ,(p 2 ]. 

ir{§i,§2,0i2\ym = D 1 ,y D2 = D 2 ) (19) 

_ n(yDl = Dl,VD2 = £> 2 |$1,$2,012) ?r($i) 7t($ 2 ) 7t(0 12 ) 

/ ir(y m = D u y D2 = D 2 |*i, *2,0 12 ) ^(^i) *"(*2) tt(6> 12 ) d$! d$ 2 d6» 12 

From the first option, we can obtain the posterior PDFs of all the parameters using both 
Di and D 2 . However, this option can be computationally expensive because of the high 
dimensional parameter space. 

The second option is to let information flow from left to right, conducting a two-step 

calibration. Following the procedure of Bayesian calibration for a single model presented in 

Section[2| $1 and Q\ 2 are first calibrated using the observation data D\. Then, the marginal 

posterior PDF of U , n(0i 2 \y D i = Di), is used as the prior when we calibrate the parameters 
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of the other network (<& 2 and #12). Applying Bayes' theorem, we have 



7T 



(0 12 \y D1 =D 1 ) 





$1,^12) vr($i) n{0 12 ) d$i 


J K(y D1 = L>i 


$i,#i 2 ) vr($i) 7r(0 12 ) d$i d0i2 



•K'{$ 2 ,0 12 \y D1 = Dl,VD2 = D 2 ) 

_ ^{UD2 = D 2 \® 2 ,0i 2 ) 7r($ 2 ) 7 ^fii2\ym = Pi) 
Jn(y D2 = D 2 \® 2 ,0 12 ) 7r($ 2 ) n(0 12 \yDi = £>i) d$ 2 d0 12 



(20) 



We can prove that Eq. 20 gives the same joint posterior PDF of $ 2 and 0i 2 as from 



Eq. [T9j By combining the two expressions in Eq. 20 , we have 
n'(® 2 ,0 12 \y D1 = D 1 ,y D2 = D 2 ) 



n(yD2 = D 2 


\<f> 2 ,0 l2 )7r(<& 2 ) J iz{y m = D x 


$i,0i2)7r($i)7r(0i 2 )d$i 


( / K(y D2 = D 2 , $ 2 , OuIvdi = £>i)d$ 2 d0 12 N 


(fn(y D i=D 1 


$ 1 ,6> 12 )7r($ 1 )7r(6» 12 )d$ 1 d6» 12 ) 



%(y D i = Di|$i,0i 2 ) 7r(yj,2 = £> 2 |$2,0r2) tt($i) tt($ 2 ) 7r(gig) , $ 
^(VD2 = D 2 \y m = Di) ir(y m = D x ) ~ ' ' 1 

-Kjyoi = Di,y D2 = D 2 \$>i,$ 2 ,0i 2 ) 7r($i) tt($ 2 ) tt(6>i 2 ) d$ 
7r(yu2 = D 2 \ym = D x ) n(y m = D\) 

= J Tr{&i,&2,0 12 \y D i = D u y D2 = D 2 ) d*i (21) 

Therefore, the second option provides us the posterior PDF of $1 based on D\, and 
the posterior PDFs of $2 and 6i 2 based on both Di and D 2 . Note that the computational 
effort in the second option can be much smaller than in the first option, due to the reduced 
number of parameters in each step of the calibration. 

The third option is similar to the second one, except that the information flows from 
right to left, i.e., $2 arid 0i 2 are first calibrated using the observation data D 2 , and then 
the marginal posterior PDF of 0i 2 , 7r(0i 2 \yD2 — D 2 ), is used as prior in the calibration of 
$ 2 and #12 using the data Di. Hence, the posterior PDFs of $1 and 0i 2 are obtained using 
both Di and D 2 , whereas the posterior PDF of <fr 2 is only based on D 2 . 

Note that although the above Bayesian network-based method is presented using a two- 
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model problem, it can be extended to the cases that N(N > 2) physics models are to be 
calibrated with limited information, and there will be up to iV! + 1 options of calibration 
available, depending on the existence of common parameters between different models. 

4-3. Summary of contributions 

In Section |2j we presented the basic theory of model calibration using Bayesian network 
following the KOH framework. In Section |3j methods were developed to address various 
issues in the practical applications of Bayesian model calibration: (1) calibration using 



interval data or time series data (Section 3.1); (2) identifiability of model parameters (Sec- 
tion 3.2 and (3) computational difficulty and possible solutions (Section 3.3). In Section |4j 
a Bayesian network-based approach was developed for the calibration of multiple physics 
models. It is shown that the available data can be used more efficiently, since the data 
of different physical quantities can be exchanged through the Bayesian network and thus 
we can calibrate the parameters of one model using information from experiments that are 
related to other physics models. 

5. Numerical examples 

We illustrate the methods presented in the previous sections using two numerical ex- 



amples. In Section 5.1[ we calibrate a transient dielectric charging model [JS] with time 

we 



series data to illustrate the Bayesian approach discussed in Section 3.1.2 In Section 5.2 



implement Bayesian calibration of multi-physics models, based on the methods presented 



in Section 3.1.1 and[4j Two types of radio-frequency (RF) microelectromechanical system 
(MEMS) devices are used in this example. Examination of model parameter identifiability 
is performed in both examples using the first-order Taylor series expansion-based method 



proposed in Section 
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5.1. Calibration of a dielectric charging model with time series data 

Dielectric charging has been identified as an important failure mechanism of RF MEMS 
switches, causing the switches to either remain stuck or fail to actuate (IHl [50]. Since our 
focus is on the calibration method instead of the physics aspects of dielectric charging, 
we only describe what is related to calibration (model inputs, unknown model parameters, 
model output, and observation data). Details of the mechanism and model development are 
provided in [3S1 H9] . The model has two input variables (voltage V, temperature T), six 
unknown parameters (trap density Nt, barrier height capture cross section a, Frenkel- 
Poole (FP) attempt frequency 7, high frequency dielectric constant £inf, and effective mass 
m*), and a single output variable (transient current density J t at time t). Experiments 
were conducted on a 200-nm silicon nitride (SiaN^ dielectric with 2 mm*2 mm area for 12 
different combinations of V and T, and these experiments were repeated for four times. The 
transient current density was measured at about 190 discrete time points between and 100 
seconds. Therefore, a large data set with size n w 12 * 4 * 190 = 9120 is available, which is 
typical for time-dependent problems. Since we have four repeated time series observations 
for each combination of inputs (V and T), and each series contains measurements at the 
same time points, the replicates can be utilized directly to compute the measurement noise 



statistics as shown in Eq. 15 In this example, the number of time points in one series 
ni ~ 190, and the number of repeated time series n2 = 4. 



As presented in Section 3.1.2, the model discrepancy term is approximated by a time- 
dependent Gaussian process (GP) discrepancy function 5 t ~ J\f(m(x, t; 0), k(x, x', t, t'\ (p)). 
In this example, we select the form of the mean function and covariance function based on 
a heuristic approach. First, the parameters of the dielectric charging model are estimated 
using the method of least squares without considering any uncertainty source. Then, we 
compute the difference between model predictions (based on least squares parameter esti- 
mation) and the corresponding experimental data. By plotting the difference versus the 
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input variables, we obtain a rough idea about the trend of the model discrepancy function. 
In this example, we observe three issues: (1) the discrepancy between model prediction and 
observation appears to vary exponentially with respect to time; (2) there is significant statis- 
tical correlation in time; and (3) the variance of the discrepancy varies with time. Therefore, 
we adopt the combination of a linear function of model inputs and an exponential function 
of time to model the mean of the Gaussian process discrepancy function. In addition to 
the squared exponential covariance function shown in Eq. [2] (denoted as fci), a time- variant 
term ^(i; w) = W\ exp(— w 2 t) is also used in order to account for the non-stationary trend 
of variance with respect to time. Thus the selected mean and covariance functions are: 



m(x, t; 4>) = <f>iV + (p 2 T + (p 3 t + 4 exp(0 5 £) 

k\{x,x',t,ti',(p^), t^t' 
k(x. x'. /-./-»= { (22) 

kx{x,x',t,-t?\ip x ) + k>2{t\w), t — t' 

where the parameters of the covariance function are ip = [ip 1 , w], and (p l = [A, l\, 1%, 1%]. 

Note that a check of identifiability is needed after selecting the mean function m(x, t; <p), 
since the addition of the discrepancy function to the original model may cause non-identifiability. 
In this example, there are 17 unknown parameters to calibrate, i.e., p = 17, and the corre- 
sponding matrix A is full rank, which suggests that the combination of the dielectric model 
and the discrepancy function is identifiable. In fact, the reason that there is no constant 
term in the linear mean function is because the constant term is not identifiable according 



to the first-order Taylor series expansion-based method developed in Section 3.2 



Once the forms of the above functions are selected, the joint likelihood function of the di- 



electric charging model parameters and the parameters of St is then formulated as in Eq. [16 
Note that the likelihood is proportional to the joint probability of all observations condi- 
tioned on model inputs and parameters, and thus the construction of likelihood requires 
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computing the determinant and inverse of the covariance matrix of these data points. If all 
the data points are used, the size of the covariance matrix will rise to around 2280 * 2280, 
which can cause several numerical difficulties, including matrix singularity and expensive 
computation of matrix determinant. We bypass such numerical difficulties with the large 
set of time series data by including only a subset of the time points in the likelihood construc- 
tion. These points are selected in a manner which reflects all the features of the dynamic 
response as closely as possible; however, the precise number of points is largely a matter of 
computational convenience. In this case, we select measurements at 8 time points from each 
data series, and thus the size of the covariance matrix reduces to 96 * 96. Note that more 
advanced methods, which approximate the original covariance matrix with a sparse matrix 
while taking into account the whole data set, can be found in [5Tj . 

Due to the high number of unknown parameters (= 17), we use the Metropolis-Hastings 
MCMC algorithm to sample these 17 parameters from their posterior probability distribu- 
tion. Note that we use uniform priors for all the parameters, since no information on the 
prior distributions is available except for the possible ranges of these parameters. The scaled 
histograms and the kernel density estimation (KDE) of posterior PDFs based on 10 6 samples 
are shown in Fig. [3} Note that I3 is the length-scale parameter corresponding to time t, and 
the posterior PDF of I3 indicates significant statistical correlation in time, which is what we 
expected. 

With the calibrated model and Gaussian process discrepancy function, we can predict the 
current density as J t = y m ,t + St- A comparison between the prediction and the unused set of 
data can help validate the calibrated model and discrepancy function. In this example, we 
compute current density using the maximum a posteriori (MAP) [52J estimation of unknown 
parameters at the 12 combinations of inputs V and T. An example comparison between 
prediction and data is shown in Fig. [5J where E[y m>t ] is the expected current density based 
on the calibrated model without discrepancy correction, E[y m j + S t ] is the expected current 

26 



(a) to* 



J 



-J 


t 


(b) i 




FT" 






(c) 7 



(d)a 



(e) £m/ 



(f) N T 



(g) 0i 



(h) 02 



(i) 03 



0) 



GO 05 



(1) A 



(m) h (n) Z 2 (o) h (p) (q) w 2 

Figure 3: Plots of the scaled histogram and posterior PDF of parameters 
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Figure 4: Validation of the calibrated model 
density based on the combination of the calibrated model and discrepancy function. The 



blue dash line in Fig. 4(b) is the model prediction based on least squares estimate of the 
parameters. 

We observe from this graphical comparison that E\y m j + dt] fits the data relatively better 
than E[y mjt ] and the prediction based on least squares method, and that the one standard 
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deviation (68% probability) bound (E[y mtt + 5 t ] ± \fV[y m j + 8 t ], where V[y m> t + S t ] is the 
variance of prediction) fully covers the data. It should be noted that the prediction based on 
Bayesian calibration accounts for model uncertainty; data uncertainty can also be taken into 
account if the full posterior PDFs of parameters are used. However, Bayesian calibration 
based on MCMC sampling methods is more computationally demanding compared with 
calibration based on least squares analysis, since the generation of 10 6 samples require the 
same amount of function evaluations. It can also be observed that there is more difference 
between the expected current density E[y m ^+S t ) and the data at the early time range (0 ~ 20 
seconds) than at the later time range, which is reflected by the wider probability bound at 
the early time range. This observation indicates that the combination of the physics model 
and the discrepancy is not sufficient to model the current density for the whole time range, 
and the actual model discrepancy may be a more complicated function with respect to time. 

5.2. Calibration of multi-physics models using interval and point data 



is 



The target MEMS device of this example (denoted as Dev-1) shown in Fig. 5(a 
used as a switch. The membrane deflects under some applied voltage, and will contact the 
dielectric pad when the applied voltage exceeds a certain threshold. This threshold voltage 
is called pull-in voltage (V p i), and the device will be closed when the contact occurs. Pull- 
in voltage is an important metric in the reliability analysis of the device after a certain 
period of usage. Several models are needed to calculate the pull-in voltage, namely dynamic 
model, electrostatic model, damping model, and creep model. A 1-D Euler-Bernoulli beam 
model is used to simulate the dynamic behavior of the MEMS device [53] • The electrostatic 
model takes applied voltage and air gap (g) as inputs, and calculates electrostatic loading as 
output. The damping model considers the gas pressure and air gap, and the corresponding 
damping force is computed [51] • The electrostatic loading, damping force, device geometry, 
material property, boundary condition, and time are the inputs of the dynamic model. The 

creep model calculates the plastic deformation of the device under long-term loading, and is 
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coupled with the dynamic model. The unknown parameters include Young's modulus (E) 
and residual stress (a rs ) in the dynamic model, and the creep coefficient A c in the Coble 
creep model [551 ES]- To predict the pull-in voltage, an iterative method is used by varying 
the values of applied voltage, and calculating the resulting maximum deflection of the beam. 
The pull-in voltage is equal to the minimum value of applied voltage that causes the beam 
to be in contact with the dielectric pad. 




(a) Dev-1: Contacting capacitive RF MEMS switch (b) Dev-2: RF MEMS varactor 



Figure 5: Example RF MEMS devices (Courtesy: Purdue PRISM center) 



5.2.1. Different data on two devices 

Due to the limitation of experimental resources, currently only the measurement data 
of pull-in voltage at an early time point is available, and the data are collected on 17 Dev- 
1 devices with different geometries and initial positions. Because the pull-in voltage data 
are obtained by keeping increasing the applied voltage by 5 volts until the switch becomes 
closed, the data are reported in the form of intervals. 

Study of creep modeling has been separately performed for another type of device (de- 
noted as Dev-2, which has different boundary conditions from Dev-1 as shown in Fig. |5(b) ), 
and measurements of device deflection under constant voltage for a relative long time period 

(~700 hours) are available. Since these two types of devices are made of the same material, 
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the material-related parameters E and A c can be considered as the same. A polynomial 
chaos expansion (PCE) surrogate model is constructed based on 3-D membrane simula- 
tion for Dev-2, with E and A c as inputs and the deflection at three different time points 
t = [200, 400, 600] hours as output, i.e., g t i = PCE(A C , E) + 5 2 - S 2 is the model discrepancy 
term. 




Figure 6: Bayesian network 



Based on the aforementioned models and data, we construct a Bayesian network as shown 
in Fig. [6] Note that A c is not directly related to pull-in voltage, since the calculation of 
pull-in voltage at a given time point only requires dynamic simulation within microseconds, 
and creep is negligible in such a short time period. Therefore the only common parameter 
between the two physics models is E. The second and the third options presented in Sec- 



tion |4.2| are both implemented for the purpose of comparison in this example, although the 
first option is not considered due to its higher computation cost. 

The identifiability of the calibration parameters in the Bayesian network given the avail- 
able experimental data is checked using the first-order Taylor series expansion-based method 



presented in Section 3.2 Since the measurement data of pull-in voltage for 17 Dev-1 devices 

30 



will be directly used to calibrate the parameters (E, o~ rs , 5i) in the left half of the Bayesian 
network in Fig. |6j we obtain a 17 * 3 first-order derivative matrix A with rank ta = 3, i.e., 
E, a rs , and 5i are identifiable with these 17 data points of pull-in voltage. We also examine 
the identifiability of parameters E, A c and 82 in the right half of the Bayesian network with 
the deflection data of Dev-2 at the three test time points (200, 400, and 600 hours). In this 
case, the size of the matrix A is 3 * 3 and the rank of A is 3, which indicates that E, A c 
and 82 are all identifiable with the deflection data. Note that this method is not applicable 
for <J bsi and <J bs2, since the standard deviations of measurement noise are the parameters 
of statistical models as stated in Section 13.21 



5.2.2. Calibration with information flowing from left to right in the Bayesian network 



Following the second option presented in Section 4.2 , the left half of the Bayesian network 



is considered first, i.e., the parameters E, cr rs , 81, and cr b s i are calibrated using the pull-in 
voltage data. The prior and marginal posterior PDFs of E, a rs , 8\, and a Q b s i are plotted in 
Fig. [7} The prior PDFs are shown as red dash lines, whereas the posterior PDFs are shown 



as black solid lines (the same format applies to Figs. [8j |9j and 10). The corresponding 
statistics are shown in Table [TJ Note that all the prior PDFs used in this example are 
assumed to be uniform, except for the prior PDF of the common parameter E in the second 
step calibration, which is the posterior PDF obtained in the first step calibration. 




(a) E 



(b) a r 





(d) a , 



Figure 7: Calibration of parameters using pull-in voltage data 



Then, the parameters in the right half of the Bayesian network are calibrated using the 
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Table 1: Prior and posterior statistics of parameters (with data on Dev-1) 



Mean Standard deviation 





Prior 


Posterior 


Prior 


Posterior 


E (GPa) 


196.0 


195.5 


5.78 


5.76 


a rs (MPa) 


10.00 


9.07 


23.10 


4.76 


5 1 (Volt) 





-4.27 


20.22 


10.35 


°obsi (Volt) 


15.50 


13.16 


8.38 


2.98 



deflection data of Dev-2, and the posterior PDF of E obtained in the first step is used as 
prior. Fig. [8] shows the prior and marginal posterior PDFs of E, A c , 82, and a bs2, and 
Table [2] contains the corresponding statistics. 




(a) E (b) A c (c) 8 2 (d) a obs2 



Figure 8: Calibration of parameters using deflection data 



Table 2: Prior and posterior statistics of parameters (with data on Dev-2) 



Mean Standard deviation 





Prior 


Posterior 


Prior 


Posterior 


E (GPa) 


195.5 


194.7 


5.76 


5.51 


A c 


5.50e-7 


5.85e-7 


1.44e-7 


1.35e-7 


8 2 (pm) 


-0.050 


-0.057 


0.087 


0.040 


a obs2 W 


0.075 


0.026 


0.043 


0.027 



5.2.3. Calibration with information flowing from right to left in the Bayesian network 

Following the third option presented in Section |4.2[ the sequence of calibration in the 

previous section is now reversed. First, the calibration parameter (E, A c , 62, and 0~ o bs2) m 
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the right half of the Bayesian network in Fig. [6] are calibrated with the deflection data of 
Dev-2. The prior and marginal posterior PDFs, and the corresponding statistics are shown 
in Fig. [|and Table [3} 

x10 B x1Q H xIO 7 




(a) E (b) A c (c) 6 2 (d) a obs2 



Figure 9: Calibration of parameters using deflection data 



Table 3: Prior and posterior statistics of parameters (with data on Dev-2) 



Mean Standard deviation 





Prior 


Posterior 


Prior 


Posterior 


E (GPa) 


196 .0 


195.1 


5.78 


5.57 


A c 


5.50e-7 


5.88e-7 


1.44e-7 


1.35e-7 


8 2 (//m) 


-0.050 


-0.054 


0.087 


0.040 


a obs2 (/im) 


0.075 


0.025 


0.043 


0.027 



Similarly to the previous section, the posterior PDF of the common parameter E obtained 
in the first step of calibration is used as prior, and the parameters in the left half of the 
Bayesian network are calibrated using the pull-in voltage data of Dev-1. The calibration 
results can be found in Fig. 10 and Table [T] 



5.2.4- Discussion 

In this example, the posterior PDFs of the parameters are computed directly using 
trapezoidal integration rule as only 4 parameters need to be calibrated at one time. Uniform 
grids are used for the numerical integration over the parameters, and the number of grid 
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(a) E (b) <r rs (c) 5i (d) a obsl 



Figure 10: Calibration of parameters using pull-in voltage data 

Table 4: Prior and posterior statistics of parameters (with data on Dev-1) 

Mean Standard deviation 

Prior Posterior Prior Posterior 

E (GPa) 195.1 194.7 5.56 5.51 

a rs (MPa) 10.00 9.08 23.10 4.76 

Si (Volt) -4.22 20.22 10.35 

a obsl (Volt) 15.50 13.16 8.38 2.98 



points for each parameter is selected based on the convergence of the posterior density 
computation. By comparing Figs. [8] and [9j or Tables. [2] and [3j we observe that the second 
and the third options give similar posterior PDFs and statistics of the calibration parameters. 
The same observation can be drawn from the comparison between Figs. [7]and 10 or Tables. [T] 



and[4| This is due to the fact that the posterior PDFs of the common parameter E obtained 
in the first step of these two options are not significantly different from the uniform prior 
PDFs. Hence, the calibration in the second step, which uses the posterior PDF of E obtained 
from the first step as prior, will give similar results to the case that the uniform prior PDF 
is used. The relatively small difference between the prior and posterior PDFs of E indicates 
that the available experimental data are insufficient to reduce significantly the uncertainty 
about E. In addition, it can be observed from Tables. [2] and [4] that both the second and the 
third options give the same posterior statistics of E after the two-step calibration, which is 
expected since in theory both options should give n(E\Di, D 2 ) as the calibrated PDF of E 
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(D 1 denotes the pull-in voltage data of Dev-1, and D 2 denotes the deflection data of Dev-2). 
6. Conclusion 

A Bayesian network-based approach is proposed in this paper to integrate the calibra- 
tion of multi-physics computational models with various sources of uncertainty and available 
experimental data. Several issues in Bayesian calibration for practical applications are dis- 
cussed, including calibration with two different types of data - interval data and time series 
data, the identifiability of model parameters, efficient computation of the likelihood func- 
tion, and more efficient use of available data by exchanging information on multiple physical 
quantities through a Bayesian network. A first-order Taylor series expansion-based method 
is developed to determine the identifiability of model parameters, and it is especially ap- 
plicable to models with unknown expressions. This method can help to avoid the wasted 
effort on parameters that cannot be identified. Using a set of multi-physics models and data 
for two types of MEMS devices, we illustrate the Bayesian network-based approach, and 
the procedure of model calibration with efficient use of available information is presented. 
Future research efforts may include (1) applying the proposed approach to systems with 
more complicated structures (e.g., multi-scale, multi- level), and (2) reducing the computa- 
tional effort by exploring more efficient uncertainty quantification algorithms and parallel 
computing. 
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